DOCUMENT RESUME

ED 338 672                                        TM 017 480

AUTHOR        Bock, R. Darrell; Zimowski, Michele
TITLE         Individualized Educational Assessment: Twelfth-Grade
              Science.
INSTITUTION   Center for Research on Evaluation, Standards, and
              Student Testing, Los Angeles, CA.
SPONS AGENCY  Office of Educational Research and Improvement (ED),
              Washington, DC.
REPORT NO     CSE-TR-324
PUB DATE      Jun 91
CONTRACT      G0086-003
NOTE          10p.
PUB TYPE      Reports - Evaluative/Feasibility (142)

EDRS PRICE    MF01/PC01 Plus Postage.
DESCRIPTORS   Adaptive Testing; Computer Assisted Testing;
              *Educational Assessment; *Grade 12; High Schools;
              *High School Students; Individual Testing; Item
              Response Theory; Laboratory Procedures; Pilot
              Projects; Pretests Posttests; Questionnaires;
              *Science Tests; Scoring; Student Evaluation; *Test
              Construction; Test Items
IDENTIFIERS   *Duplex Design; *Individualized Evaluation;
              Performance Based Evaluation

ABSTRACT 

The goals, principles, and methods of an
individualized educational assessment are described as implemented in
a 12th-grade science assessment instrument undergoing field trials in
Ohio. Pilot tests were planned for December 1990 and March and April
1991. The assessment design incorporates the duplex design of R. D.
Bock and R. J. Mislevy (1988) and two-stage testing made possible by
computer technology. The first-stage test booklet, which is designed
to be administered in February of the school year, consists of a
student questionnaire and a 20-item pretest to give a rough idea of
the student's level of science preparation. A second-stage form of
the test, which is designed to be administered in late March or early
April of the school year, is to be assigned to the student based on
the level of preparation apparent. The duplex design instrument is
intended for scoring on item response theory scales at the student
and school levels. Laboratory performance tests have been developed
and tested to form part of the student assessment. Currently, there
are six exercises in each of four areas: general science, biology,
chemistry, and physics. Each exercise requires 80 minutes of
laboratory work. Three references are listed, and one table presents
the assessment content-by-process item classification. (SLD)

As originally conceived in the National Assessment of Educational Progress
(NAEP), educational assessment was intended to report educational outcomes at a
high level of aggregation: average attainment for states, regions, or the nation.
Using efficient techniques of multiple matrix sampling, in which each student
responds to only a limited number of items randomly selected from a much larger
set, NAEP attained high levels of generalizability for numerous educational
objectives with relatively small demand on student time. When this technique was
adapted to state-level assessments, as in the California Assessment Program, the
reporting was extended to the level of separate schools, but there was no attempt to
evaluate the attainment of individual students.

Although the state programs based on this conception served well the needs
of policymakers, planners, and curriculum specialists, they did not satisfy the
requirements of school principals, teachers, and parents for information that would
guide and certify the progress of individual students. Neither did they motivate
students by giving them a personal stake in the outcome of their efforts on the
assessment tests. To satisfy these additional demands on the testing program, many
states adopted, at the expense of duplication of effort and further encroachment on
classroom time, a two-tiered program including matrix-sampled assessment testing
and traditional student-level achievement testing. In an effort to serve the
purposes of these overlapping testing programs in a single, comprehensive
assessment, the National Opinion Research Center (NORC) in 1986 began studies of
a new approach to educational evaluation that combined features and benefits of
school-level matrix sampling assessment and individual-level achievement testing.



The NORC Assessment Design Project

With support from the U.S. Office of Educational Research and
Improvement (OERI), NORC developed and field-tested in the states of Illinois and
California a new type of multipurpose assessment instrument based on the "Duplex
Design" of Bock and Mislevy (1988). The first trial of the design applied to
eighth-grade mathematics and provided student-level scores in the content areas of
Number, Algebra, Geometry, Measurement, and Statistics, and in the process skill
areas of Factual Knowledge, Conceptual Understanding, and Problem-solving; at the
same time it provided school-level scores in 45 curricular objectives in eighth-grade
mathematics. The results of the field trials clearly demonstrated that the duplex
instrument could provide detailed program evaluation as well as accurate scores for
students, schools, districts, and the state, all based on a testing session intermediate
in length between matrix-sampled assessment and traditional achievement testing
(Bock & Zimowski, 1989).

Since August 1989, at the invitation of the National Science Foundation and
with continuing support from OERI, NORC has been implementing another duplex
design in the areas of Earth Science, Biology, Chemistry, and Physics at the
twelfth-grade level. Exploiting new technology for computer-controlled laser printing and
optical character reading, this project has achieved a breakthrough in large-scale
assessment technique by producing test forms individualized to the course
background and performance levels of each student taking the science test. These
forms include multiple-choice and open-ended essay questions. In addition, the
assessment design incorporates a similarly individualized component of "hands-on"
laboratory performance assessment in General Science, Biology, Chemistry, and
Physics. The project has also led to the development of materials and procedures
for a "Graded Mark-point" method of reliably scoring extended responses to the
open-ended and laboratory performance exercises.

The present report describes the goals, principles, and methods of the
individualized educational assessment as implemented in a twelfth-grade science
assessment instrument now undergoing field trials in the state of Ohio. A pilot
school in Ohio will be tested in the first week of December 1990, and all
twelfth-grade students in a stratified probability sample of 40 Ohio schools will be tested in
March and April of 1991. A final report of the study is due on November 30, 1991.



Two-stage Testing 

In addition to the duplex principle, the other major innovation of the
NORC assessment design project is the practical implementation of two-stage
testing. Since early in the development of item response theory (IRT), it has been
known that substantial reductions in testing time, without sacrifice of test reliability,
can be obtained by a form of adaptive testing in which students are tested in two
stages, where the second-stage test is selected to be maximally informative, given
the student's score on the first-stage test. Studies by Lord (1980) showed that with
second-stage forms representing at least four levels of difficulty, comparable
reliability could be obtained in about one-third the testing time required for a
conventional achievement test.
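
The selection rule described above can be illustrated with a minimal sketch. It assumes a Rasch (one-parameter) IRT model, under which an item's Fisher information at ability theta is p(1 - p), and it picks the form whose summed item information is greatest at the first-stage ability estimate. The form names, item difficulties, and function names are hypothetical and are not taken from the NORC instrument.

```python
import math

def rasch_p(theta, b):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def form_information(theta, difficulties):
    """Fisher information of a form at theta: sum of p * (1 - p) over its items."""
    total = 0.0
    for b in difficulties:
        p = rasch_p(theta, b)
        total += p * (1.0 - p)
    return total

def pick_second_stage_form(theta_hat, forms):
    """Return the name of the form that is most informative at theta_hat."""
    return max(forms, key=lambda name: form_information(theta_hat, forms[name]))

# Hypothetical item difficulties for four second-stage difficulty levels.
FORMS = {
    "level_1": [-2.0, -1.5, -1.0],
    "level_2": [-1.0, -0.5, 0.0],
    "level_3": [0.0, 0.5, 1.0],
    "level_4": [1.0, 1.5, 2.0],
}

if __name__ == "__main__":
    for theta_hat in (-1.6, -0.4, 0.6, 1.4):
        print(theta_hat, pick_second_stage_form(theta_hat, FORMS))
```

Under this model a form is most informative for students whose ability lies near the centre of its item difficulties, which is why matching form difficulty to the first-stage score shortens the test without loss of reliability.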

For a long time the logistics of two-stage testing were thought to be too
complex to allow applications in large-scale assessment programs. Recently, 
however, two technological developments have radically changed this picture. One 
of these is the availability of high-capacity, programmable optical character readers 
used commercially in processing responses from direct-mail advertising promotions. 
The readers are capable of scoring test booklets duplicated by any printing method, 
rather than the high-precision printing previously required for scannable test 
booklets. The other development is that high-capacity laser printers driven by 
computers are now able to assemble the material to appear in a test booklet as the 
pages are printed. The NORC assessment design project has made use of this 
technology to implement large-scale two-stage testing in a practical way. 

First-stage Test 

The first-stage test booklet is designed to be administered in February. It 
consists of a student questionnaire, asking for high school course history in Science
and Mathematics, and a 20-item pretest with 5 items each in the areas of Earth 
Science, Biology, Chemistry and Physics. The items of the pretest are widely spaced 
in difficulty and give a rough estimate of the student's level of proficiency in these 
subjects. On the basis of a student's response to course background questionnaires 
and the score on the pretest, he or she is assigned a second-stage form adapted to an 
appropriate level of science preparation in each of the four areas. The 
questionnaire and test are designed to be administered to twelfth-grade students by 
teachers in the participating schools. The completed test booklets are returned to
NORC for scanning and analysis, and the results are used to control the generation 
of the second-stage forms appropriate for each student. Each such booklet is labeled 
clearly with the student's name on the cover, and each page of each booklet also 
carries optically readable numbers that identify the student and the items of that 
particular form. 

Second-stage Test 

The second-stage test is designed to be administered in late March or early 
April of the student's twelfth-grade program. The forms of the test are of two types, 
which can be administered separately or in combination: Type I consists of only 
multiple-choice items; Type II consists of multiple-choice items and open-ended 
items. 

The Type I forms are further divided into a part A and part B, each consisting
of 32 items. If the test is to be used to assign scores to students for purposes of
certification, it is recommended that each student be administered both part A and
part B in 80 minutes of testing time. If the scores are to be used only to evaluate
schools or programs, or to inform interested parties, each student may be randomly
assigned a part A or part B, to be administered in 40 minutes of testing time.

The Type II forms each contain 32 multiple-choice items and 4 open-ended
items. Forty minutes of testing time is allocated to the multiple-choice items in the
first half of the form, and 40 minutes to the open-ended items in the second half of
the form. These forms are also divided into a part A and part B, each consisting of 16
multiple-choice items and 2 open-ended items. These parts may also be
administered separately if highly accurate student-level scores are not required.



Forms and Booklets 

Each second-stage test form, including part A and part B, is replicated in
parallel six times. In addition, each form consists of four booklets constructed at
each of four levels of difficulty: the lowest level is aimed at students who have only
one course in secondary-school science; the next two levels are aimed at students
with at least two courses in science, with the lower level being assigned to those
students who score below the median on the pretest and the higher level being
assigned to students who score above the median; the highest level of difficulty is
aimed at students with Advanced Placement courses in science.
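
The four-level rule in the preceding paragraph can be written as a short decision function. This is only a sketch of the rule as stated, not the project's assignment program. Three details are assumptions because the report does not specify them: Advanced Placement takes precedence over the other criteria, a pretest score exactly at the median goes to the lower of the two middle levels, and a student with fewer than two science courses is placed at level 1. The function and parameter names are hypothetical.

```python
def assign_level(n_science_courses, has_advanced_placement,
                 pretest_score, pretest_median):
    """Return a second-stage difficulty level from 1 (lowest) to 4 (highest)."""
    if has_advanced_placement:
        return 4                      # assumption: AP overrides everything else
    if n_science_courses >= 2:
        # Middle levels are split at the pretest median.
        # Assumption: a score equal to the median is placed at level 2.
        return 3 if pretest_score > pretest_median else 2
    return 1                          # assumption: covers zero or one course

if __name__ == "__main__":
    print(assign_level(1, False, 15, 10))   # one course only
    print(assign_level(3, False, 8, 10))    # two or more courses, below median
    print(assign_level(3, False, 14, 10))   # two or more courses, above median
    print(assign_level(4, True, 12, 10))    # Advanced Placement
```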

Because the second-stage forms are produced by computer contingent on
information from the student questionnaires and pretest, the relative difficulty can
be adapted to the type of course background of each student. For example, if a
student has one course in earth sciences, two in biology, and one in chemistry, the
biology content might be pitched at level three, the chemistry at level one, and the
physics and earth sciences at level two. If a student has only one course in general
science or earth sciences and biology, but has a reasonably good pretest score, the
second-stage test will be pitched at level two in biology and earth sciences, but at
level one in chemistry and physics. All possible profiles of student preparation can
be accommodated by these computer-generated second-stage forms.



Item Structure of the Second-stage Forms

The content-by-process classification of items in the second-stage form is
shown in Table 1. The table represents the items of the 64-item form. The open
and cross-hatched entries represent one of the possible divisions of the form into
part A and part B. Other forms select item- and process-content for part A and part
B in all possible combinations. Each test form is a random assignment of items
classified according to the categories of Table 1. From a pool of 11,500 items, 24 test
booklets have been constructed (6 forms at each of 4 levels of difficulty). Thus, the
second-stage instrument consists of stratified randomly parallel forms containing a
possible 1,536 different items.
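
The counts in this paragraph can be checked with a few lines of arithmetic. The sketch assumes one item per content-by-process cell, which is an inference from the 64-item form and the 64 cells mentioned in the report rather than an explicit statement, and it takes four topics per content area from the rows of Table 1. The variable names are illustrative.

```python
from itertools import product

CONTENT_AREAS = ("Physics", "Chemistry", "Biology", "Earth Sciences")
TOPICS_PER_AREA = 4   # read from the rows of Table 1
PROCESSES = (
    "Knowledge of Scientific Terminology and Facts",
    "Knowledge of Scientific Methods and Procedures",
    "Understanding of Scientific Concepts and Principles",
    "Problem Solving",
)
N_FORMS, N_LEVELS = 6, 4

# One cell per (content area, topic, process) combination.
cells = list(product(CONTENT_AREAS, range(1, TOPICS_PER_AREA + 1), PROCESSES))

items_per_form = len(cells)                       # assumption: one item per cell
booklets = N_FORMS * N_LEVELS
max_distinct_items = booklets * items_per_form

print(items_per_form, booklets, max_distinct_items)   # prints: 64 24 1536
```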



IRT Scaling 

The instrument is based on a Duplex Design intended for scoring on IRT
scales in three directions. At the student level, scale scores can be computed for (a)
each of the four content areas and (b) each of the four process categories, for a total
of eight scales, plus an overall index of science achievement. At the school level,
each of the 64 cells within the content-by-process classification can be assigned scale
scores by (c) accumulating information over the 24 test booklets.

For purposes of IRT scaling, the test booklets are connected by common
items that link them in all three of these dimensions. There are link items with
respect to content and process over the four levels of difficulty within forms for
computing student scores, and with respect to the content-by-process elements
from one form to another for scaling at the school or program level. These common
linking items reduce the total number of distinct items to 1,344. Computer
procedures are employed to calculate scores on the linked scales and to generate
student-level and school-level reports.
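
To illustrate why a common scale matters for scoring, the following sketch computes an expected a posteriori (EAP) ability estimate under a Rasch model with a standard normal prior. This is a generic IRT scoring method chosen for illustration; the report does not say which estimator or item model the NORC procedures use, and the item difficulties below are invented. Once linking has placed item difficulties from different booklets on one scale, the same response pattern yields a higher score on a harder booklet.

```python
import math

def rasch_p(theta, b):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def eap_score(responses, difficulties, n_points=81, lo=-4.0, hi=4.0):
    """EAP ability estimate on an evenly spaced grid with an N(0, 1) prior.

    responses    -- list of 0/1 item scores
    difficulties -- item difficulties on the common (linked) scale
    """
    step = (hi - lo) / (n_points - 1)
    numerator = 0.0
    denominator = 0.0
    for k in range(n_points):
        theta = lo + k * step
        weight = math.exp(-0.5 * theta * theta)      # unnormalised prior density
        for x, b in zip(responses, difficulties):
            p = rasch_p(theta, b)
            weight *= p if x == 1 else 1.0 - p       # likelihood of this response
        numerator += theta * weight
        denominator += weight
    return numerator / denominator

if __name__ == "__main__":
    pattern = [1, 1, 0]
    print(eap_score(pattern, [-1.0, -1.0, -1.0]))    # easier booklet
    print(eap_score(pattern, [1.0, 1.0, 1.0]))       # harder booklet
```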

For the forms containing multiple-choice and open-ended items, a student-level
report for all four content and all four process dimensions can be generated
only if the student takes the complete 80-minute version. If the student takes only 
the 40-minute part A or part B, only two content areas can be scored and reported. 



The Laboratory Performance Tests 

Performance tests for twelfth-grade science have been developed and tested 
by Professor Rodney Doran, State University of New York at Buffalo. The exercises 
are based on principles formulated by Professor Pinchas Tamir, University of Tel 
Aviv, for the Israeli Matriculation Examination. At present there are six
exercises in each of four areas: General Science, Biology, Chemistry, and Physics. 
Each exercise requires 80 minutes of laboratory work. 

The exercises are designed to be administered by science teachers who set 
up the experiments using materials supplied by NORC. The instructions to the 
students are in written form consisting of a Part I, in which students are asked to 
design the experiment with these materials to answer certain specified questions,
and Part II in which explicit instructions for the experiment are given and 
interpretative questions asked. Students keep a record of their work and write their 
conclusions on forms supplied with the instructions. 

For the Biology and Chemistry exercises students work in pairs assigned by 
NORC on the basis of course background and pretest scores; for Physics and General 
Science, students work individually. Any pair of students or individual student is 
assigned, by NORC, to one of the six exercises in each science area. Students are 
not assigned to exercises in Biology, Chemistry, or Physics unless they have had at 
least one full course in that subject.

The student records are returned to NORC for scoring by reading teams
recruited from high school science teachers especially trained for this work. These
teams also score the open-ended items of the paper-and-pencil assessment. The
ratings made by the readers are scaled by IRT methods developed for the California
Direct Writing Assessment.

TABLE 1.
Content-by-Process Item Classification

Process (columns):
    Knowledge of Scientific Terminology and Facts
    Knowledge of Scientific Methods and Procedures
    Understanding of Scientific Concepts and Principles
    Problem Solving

Content (rows):
    1. Physics
        Mechanics
        [illegible]
        [illegible]
        Waves, Optics, and Sound
    2. Chemistry
        The Atomic Model
        Chemical Reactions
        [illegible]
        States of Matter
    3. Biology
        [...] of the Cell
        [...] of the Organism
        Reproduction and Genetics
        Biological Diversity
    4. Earth Sciences
        Space
        Air
        Water
        Land

Legend: open entries, Part A; cross-hatched entries, Part B.
[The cell-by-cell Part A / Part B markings are not legible in this copy.]